Abstract

The adaptation rule of Vector Quantization algorithms, and consequently
the convergence of the generated sequence, depends on the existence and
properties of a function called the energy function, defined on a
topological manifold. Our aim is to investigate the conditions of
existence of such a function for a class of algorithms exemplified by the
initial "K-means" (Mac-Queen, 1967) and Kohonen algorithms (Kohonen, 1982;
Kohonen, 1988). The results presented here supplement previous studies,
including (Tolat, 1990), (Erwin et al., 1992), (Cottrell et al., 1994),
(Pages, 1993) and (Cottrell et al., 1998). Our work shows that the energy
function is not always a potential but is at least the uniform limit of a
series of potential functions, which we call a pseudo-potential. It also
shows that a large number of existing vector quantization algorithms
developed by the Artificial Neural Networks community fall into this
category. The framework we define opens the way to studying the
convergence of all the corresponding adaptation rules at once, and a
theorem gives promising insights in that direction. We also demonstrate
that the "K-means" energy function is a pseudo-potential but not a
potential in general. Consequently, the energy function associated with
the "Neural-Gas" is not a potential in general.

Keywords 

Vector Quantization, K-means, Self-Organizing Maps, Neural-Gas, energy 
function, potential function, pseudo-potential 

1 Introduction 



In vector quantization theory (Gray and Neuhoff, 1998), a set of prototypes
(also called "codebook vectors", "reference vectors", "units" or "neurons")
$w = (w_1, \ldots, w_n)$ is placed on a manifold $V \subset \mathbb{R}^d$,
$d \geq 1$, in order to minimize the following integral function, called the
"energy function":

$$E_V(w) = \frac{1}{2} \int_V \sum_{p=1}^{n} P(v)\, \psi_p(w, v)\, (v - w_p)^2 \, dv = \int_V F(w, v)\, dv \qquad (1)$$

where $P(v)$ indicates the probability density defined on $V$. We focus on
the stochastic iterative approaches where, at each time step, a datum $v$ is
drawn from the probability density function (pdf) $P$, and the prototypes
$w$ are adapted according to $v$ using the adaptation rule:

$$\Delta w_p = \alpha\, \psi_p(w, v)\, (v - w_p) \qquad (2)$$

where the adaptation step is tuned using the parameter $\alpha$, generally
decreasing over time ($\alpha$ is taken equal to 1 hereafter without
restricting the generality of the results), and $\psi_p$ is a
"neighborhood" function particular to each vector quantization algorithm.
Here we focus on discontinuous $\psi_p$ functions.
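
As an illustration of the adaptation rule (2), the following minimal sketch applies one stochastic step to a list of prototypes. The function names (`kmeans_neighborhood`, `adapt`) and the choice of the K-means winner-take-all neighborhood are assumptions made for this example only; any other neighborhood function with the same signature could be passed instead.

```python
def kmeans_neighborhood(w, v):
    """Hypothetical psi_p(w, v): 1 for the closest prototype, 0 otherwise.

    Ties are broken by the lowest index (an assumption of this sketch).
    """
    d2 = [sum((wi - vi) ** 2 for wi, vi in zip(wp, v)) for wp in w]
    winner = d2.index(min(d2))
    return [1.0 if p == winner else 0.0 for p in range(len(w))]


def adapt(w, v, psi, alpha=1.0):
    """One step of rule (2): w_p <- w_p + alpha * psi_p(w, v) * (v - w_p)."""
    weights = psi(w, v)
    return [
        [wi + alpha * s * (vi - wi) for wi, vi in zip(wp, v)]
        for wp, s in zip(w, weights)
    ]


# Two prototypes on the real line; the datum 0.2 only moves the closest one.
prototypes = adapt([[0.0], [1.0]], [0.2], kmeans_neighborhood, alpha=0.5)
```

Passing the neighborhood function as an argument mirrors the role of $\psi_p$ in (2): the update itself is the same for every algorithm considered here, and only $\psi_p$ changes.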

A main concern in the field of Vector Quantization is to decide whether or
not the adaptation rule (2) corresponds to a stochastic gradient descent
along the energy function (1), i.e. whether or not this energy function is
a potential on the entire manifold $V$. On the one hand, if the energy
function is a potential, then the convergence of the prototypes obeying
their adaptation rule toward a minimum of this energy function is well
established, in particular in the stochastic optimization framework
(Robbins and Monro, 1951; Albert and Gardner, 1967) with which this paper
is concerned. For example, the energy function associated with the K-means
algorithm (Mac-Queen, 1967; Ahalt et al., 1990), the stochastic version of
the LBG algorithm of Linde et al. (Linde et al., 1980), is a potential as
long as the pdf $P$ is continuous (Kohonen, 1991; Pages, 1993;
Cottrell et al., 1998).



On the other hand, if the energy function is not a potential, then very
little is known about the convergence of the corresponding adaptation rule.
For example, several results (Tolat, 1990; Erwin et al., 1992;
Heskes and Kappen, 1993; Heskes, 1999) have already shown that, for a
continuous density $P$, the vector adaptation rule of the Kohonen
Self-Organizing Map (SOM) algorithm (Kohonen, 1982; Kohonen, 1988) does not
correspond to a stochastic gradient descent along a global energy function.
The convergence, although observed in practice, turns out to be very
difficult to prove, even though most of the efforts have been devoted to
the Kohonen rule (Cottrell et al., 1994; Cottrell et al., 1998;
Benaim et al., 1998).



All the vector quantization algorithms we study in this paper are variants
of the K-means algorithm, as we will see in section 5. We know that these
algorithms converge in practice toward acceptable values of their energy
functions, whether or not they are proved to be associated with potentials.
However, a theoretical study of their convergence is not available, so they
remain largely heuristic. Among all these algorithms, the Neural-Gas
(Martinetz and Schulten, 1994) deserves particular attention. It has been
claimed by its authors to be associated with a global potential in general,
hence with a converging adaptation rule. We propose a counter-example with
a discontinuous pdf $P$ which demonstrates that this claim is not true.
This shows that the study of the convergence of all these algorithms is
still in its infancy and motivates the present work.

In this paper, we propose a framework which encompasses all these al- 
gorithms. We study this framework and we demonstrate that the energy 
function associated to these algorithms is not a potential in general. We also 
demonstrate that this energy function belongs to a broad class of functions 
which includes potential functions as a special case. The energy functions 
within this class are called "pseudo-potentials". The results we obtain do 
not depend on the continuity of the probability density function P, and give 
a first step toward an explanation why all the algorithms shown to belong 
to this framework succeed, in practice, in minimizing their associated energy 
function whether they are potentials or not. This framework should open up 
further avenues for a general study of the convergence properties of all the 
algorithms it contains at once. 

In section 2, we present the framework of this study. In section 3, we
define a "pseudo-potential" function, which can be approximated by a series
of potential functions: we define the concept of cellular manifold and this
series of potentials. In section 4, we give the main theorem, which states
that an energy function of that framework is necessarily a
pseudo-potential. We consider the K-means to show that pseudo-potentials
are not always potentials. We discuss the consequences for the convergence
of the corresponding adaptation rule. In section 5, we show that most of
the common vector quantization algorithms belong to that framework.
Finally, we conclude in section 6.

2 Framework 

We consider $(\mathbb{R}^d, \|\cdot\|)$, the Euclidean $d$-dimensional space
equipped with the Euclidean norm. Let $D$ be a non-empty bounded set in
$\mathbb{R}^d$. Let $\delta$ be the diameter of $D$ and $V$ a topological
manifold included in $D$. Let $w = (w_1, \ldots, w_n)$ be a set of
prototypes in $D$.

For $p = 1, \ldots, n$, the Voronoi cell associated with $w_p$ is usually
defined as (Okabe et al., 1992):

$$V_p = \{ v \in V \mid \forall q = 1, \ldots, n, \ \|v - w_p\| \leq \|v - w_q\| \} \qquad (3)$$

The set of $V_p$, $p = 1, \ldots, n$, provides a cellular decomposition of $V$.

For any $l$, the distance between $w_l$ and $v$ is denoted $d_l = \|w_l - v\|$.

We will show in section 5 that the neighborhood function $\psi_p$ of various
algorithms is constructed on the basis of the Heaviside step function of
the distances $d_l$, denoted $H$, such that $H(x < 0) = 0$ and
$H(x \geq 0) = 1$. These step functions cause discontinuities of the
corresponding energy functions or of their derivatives, which appear at the
boundaries of the Voronoi cells. This is the reason why we focus on the
following class of neighborhood functions in the definition of our
framework:

$$\psi_p(w, v) = \varphi_p\big(\{H(d_i^2 - d_j^2)\}_{i,j}\big) \qquad (4)$$

where $\varphi_p$ is a bounded function.



We consider any probability density function $P$ such that
$\int_V P(v)\, dv = 1$. In other words, none of the results presented
hereafter depends on the continuity of $P$.

3 Cellular manifolds and pseudo-potential 

The discontinuities of the neighborhood functions $\psi_p(w, v)$ occur on
the boundaries of the Voronoi cells. To isolate these boundaries and ease
their study, we consider a part of the manifold $V$ which does not contain
them, called the cellular manifold (its complement is called the tubular
manifold). This leads to the subsequent definition of pseudo-potential
functions.

3.1 The family of cellular manifolds $V^\eta$

The cellular manifold is based on the Voronoi cells defined by the set of
vectors $w_p$ and is arbitrarily close to the manifold $V$ in the sense of
the Lebesgue measure.

Let $\eta$ be a number $\ll 1$; we denote $T_p^\eta(w) = T_p^\eta$ the open
tubular neighborhood, of thickness $\eta$, of the boundary of $V_p$,
included in $V$.

This neighborhood is shown in figure 1 in $\mathbb{R}^2$.

Then, for a given $w = (w_1, \ldots, w_n)$ with $w_i \in V$, we define the
cellular manifold $V^\eta(w)$ as the set of vectors of $V$ which are not in
the tubular neighborhood $T_p^\eta$ for any $p$:

$$V^\eta(w) = V \setminus \Big( \bigcup_{p=1}^{n} T_p^\eta \Big) \qquad (5)$$

This means that the smaller $\eta$, the closer to $V$ the cellular manifold
$V^\eta(w)$, which does not contain the boundaries of the Voronoi cells. In
other words, $V \setminus V^\eta(w)$ "tends" towards the boundaries of the
Voronoi cells as $\eta$ tends towards 0. $V \setminus V^\eta(w)$ is called
the tubular manifold.

We can then state the following property:

$\mathbb{R}^d$ being provided with the Lebesgue product measure, $V^\eta(w)$
verifies:

$$\mathrm{meas}(V \setminus V^\eta(w)) = O(\eta) \qquad (6)$$

And we have in particular:

$$\lim_{\eta \to 0} \mathrm{meas}(V \setminus V^\eta(w)) = 0$$

The proof of this property follows:

$$\mathrm{meas}(V \setminus V^\eta(w)) = \mathrm{meas}\Big( \bigcup_{p=1}^{n} T_p^\eta \Big) \leq \sum_{p=1}^{n} \mathrm{meas}(T_p^\eta)$$

We have to consider two subcases according to the dimension $d$:

If $d = 1$: the boundaries of the Voronoi cells are points, thus their
measure is null. One has in this case
$\sum_{p=1}^{n} \mathrm{meas}(T_p^\eta) \leq (n - 1)\eta$, whence the result.

If $d > 1$: we have
$\mathrm{meas}(T_p^\eta) \leq \eta \cdot \mathrm{meas}_{d-1}(\mathrm{boundary}(V_p)) + O(\eta^2)$,
where the residual term $O(\eta^2)$ is bounded by the following sum: each
term of the sum is the product of the measures of the $(d-k)$-cells
($k > 1$) of the polyhedral decomposition of the boundary of $V_p$ by the
volumes of the $k$-balls of radius $\eta$ (i.e. proportional to $\eta^k$).
However, $D$ is bounded, therefore all the measures of the boundaries of
$V_p$ are finite, whence the result.
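
The case $d = 1$ can be checked numerically. The sketch below assumes $V = [0, 1]$ and takes the tube around each Voronoi boundary (the midpoint of two adjacent prototypes) to be the interval of total width $\eta$ centred on it; the function name `tube_measure_1d` and these conventions are assumptions of the example. It computes the measure of the union of the tubes, which is at most $(n - 1)\eta$.

```python
def tube_measure_1d(w, eta, lo=0.0, hi=1.0):
    """Measure of the union of the tubes of width eta around the Voronoi
    boundaries of the 1-D prototypes w, clipped to V = [lo, hi]."""
    ws = sorted(w)
    # In one dimension, the Voronoi boundaries are the midpoints of
    # adjacent prototypes.
    mids = [(a + b) / 2 for a, b in zip(ws, ws[1:])]
    intervals = sorted((max(lo, m - eta / 2), min(hi, m + eta / 2)) for m in mids)
    total, right = 0.0, lo
    for a, b in intervals:
        a = max(a, right)          # skip the part already counted
        if b > a:
            total += b - a
            right = b
    return total


# Three prototypes: two boundaries, hence a measure of 2 * eta here.
measure = tube_measure_1d([0.1, 0.5, 0.9], eta=0.01)
```

When tubes overlap or are clipped by the ends of $V$, the measure is strictly smaller than $(n - 1)\eta$, which is consistent with the bound being an inequality.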

3.2 Definition of a pseudo-potential 

In general, a potential is defined as a differentiable function of its
variables. We define a wider class of functions that we call
pseudo-potentials, which contains potential functions as a special case.
Pseudo-potentials do not satisfy in general the hypotheses of
differentiability at every point but may be approximated by a series of
potential functions. Thus a potential is a pseudo-potential but the
converse is false: a pseudo-potential is not necessarily differentiable
everywhere and therefore is not necessarily a potential.



Definition: let $\Omega$ be a non-empty bounded set in $\mathbb{R}^d$ and
$n \geq 1$ fixed. The function $E_\Omega : \Omega^n \to \mathbb{R}$ is called
a pseudo-potential if there exists a family of potential functions
$E_\Omega^\eta : \Omega^n \to \mathbb{R}$, $\eta > 0$, such that

$$\lim_{\eta \to 0} \| E_\Omega - E_\Omega^\eta \|_\infty = 0$$

where $\|\cdot\|_\infty$ denotes the norm of uniform convergence.

In our case, we focus on the energy function $E_V(w)$ defined by (1).

Introducing pseudo-potentials enables all these algorithms to be placed
in the same framework (see section 5). In this framework, the neighborhood
function belongs to the family defined in (4) and the associated energy
function may not be differentiable on the boundaries of the Voronoi cells,
hence is possibly not a potential on the whole manifold $V$.

This leads us to the main result about the energy function $E_V$.

4 The energy function $E_V$ is a pseudo-potential

We show that the energy function $E_V$ defined in (1), under the hypotheses
of section 2, may be considered as the limit of a series of differentiable
functions over the manifold $V$, without being itself differentiable over
$V$, i.e. $E_V$ is a pseudo-potential.

Theorem: The energy function $E_V$ is a pseudo-potential with
$\| E_V(w) - E_V^\eta(w) \|_\infty = O(\eta)$.

The first part of the theorem means that $E_V$ is not necessarily a
potential over $V$, since it is not always differentiable on the boundaries
of the Voronoi cells.

The second part means that the difference between the energy $E_V(w)$ and
the energy $E_V^\eta(w)$, both defined on the whole of $V$, is bounded by a
value proportional to $\eta$, hence as small as desired. In other words,
even if $E_V(w)$ is not a potential, it is very close to being one.



4.1 Proof of the theorem 

To prove that $E_V$ is a pseudo-potential, we consider, for $\eta > 0$, the
functions $E_V^\eta$ defined as:

$$\forall w \in V^n, \quad E_V^\eta : w \mapsto \int_{V^\eta(w)} F(w, v)\, dv$$

We first show that these functions, which are defined on the whole
manifold $V$, are differentiable on $V^\eta(w)$ (i.e. the domain over which
the integral is carried out). Second, we show that the difference
$\| E_V(w) - E_V^\eta(w) \|_\infty$ equals $O(\eta)$, hence that
$\lim_{\eta \to 0} \| E_V - E_V^\eta \|_\infty = 0$, fulfilling the
conditions necessary for $E_V$ to be a pseudo-potential.

The proof of the first part of the theorem rests on the behavior of the
functions $\psi_p$. When the current $v$ is far enough from the boundaries
of the Voronoi cells, these functions behave like constants, while on these
boundaries they have discontinuities. We need to ensure the
differentiability with respect to $w$ of the functions $F(w, v)$, which
depend on the $\psi_p$ functions, and to control the integration domain
$V^\eta(w)$ when $w$ varies. This is the purpose of the two propositions
which follow: they show that when the variation $\zeta$ of $w$ remains
lower than a given bound, the variations of $F(w, v)$ (Proposition 1) and
those of the integration domain $V^\eta$ (Proposition 2) are negligible
compared to the norm of $\zeta$.

4.1.1 Invariance of the $\psi_p$ functions

This proposition ensures the invariance of the $\psi_p(w, v)$ functions for
$v$ belonging to $V^\eta(w)$ and for sufficiently small variations $\zeta$
of the prototypes $w$.



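Proposition 1: For $\zeta = (\zeta_1, \ldots, \zeta_n) \in (\mathbb{R}^d)^n$
with $w + \zeta = (w_1 + \zeta_1, \ldots, w_n + \zeta_n) \in V^n$, we denote
$|\zeta| = \max_{p=1,\ldots,n} \|\zeta_p\|$ and
$d_r^\zeta = \|(w_r + \zeta_r) - v\|$. Then we have:

$\exists\, \nu > 0$ such that $\forall r, p = 1, \ldots, n$, for $|\zeta| < \nu$:

$$H\big((d_r^\zeta)^2 - (d_p^\zeta)^2\big) = H(d_r^2 - d_p^2), \quad \forall v \in V^\eta(w) \qquad (7)$$

Proof:

The proof is based on the existence of a bound, denoted $\nu$, within which
the invariance of the Heaviside function with respect to $\zeta$ is
ensured. First, we consider the case where the Heaviside function takes the
value 1, and then the case where it is 0:

(i) Considering $H(d_r^2 - d_p^2) = 1$, we must find a condition on $\zeta$
for which $(d_r^\zeta)^2 - (d_p^\zeta)^2 > 0$:

For $r \neq p$, we have $d_r^2 - d_p^2 > \eta^2$ and

$$(d_r^\zeta)^2 - (d_p^\zeta)^2 = d_r^2 - d_p^2 + 2(w_r - v \mid \zeta_r) - 2(w_p - v \mid \zeta_p) + \zeta_r^2 - \zeta_p^2 \qquad (8)$$

where $(\cdot \mid \cdot)$ denotes the scalar product. However, for the
scalar products, we have:

$$(w_r - v \mid \zeta_r) \geq -2\delta \|\zeta_r\|$$

and

$$(v - w_p \mid \zeta_p) \geq -2\delta \|\zeta_p\|$$

hence

$$(d_r^\zeta)^2 - (d_p^\zeta)^2 > \eta^2 - 4\delta(\|\zeta_r\| + \|\zeta_p\|) + \zeta_r^2 - \zeta_p^2$$

Finally, we have:

$$(d_r^\zeta)^2 - (d_p^\zeta)^2 > \big(\tfrac{1}{2}\eta^2 - \zeta_p^2 - 4\delta\|\zeta_p\|\big) + \big(\tfrac{1}{2}\eta^2 + \zeta_r^2 - 4\delta\|\zeta_r\|\big) = a_1 + a_2$$

(ii) Considering $H(d_r^2 - d_p^2) = 0$, we must find a condition on $\zeta$
for which $(d_p^\zeta)^2 - (d_r^\zeta)^2 > 0$:

A similar calculation leads to:

$$(d_p^\zeta)^2 - (d_r^\zeta)^2 > \big(\tfrac{1}{2}\eta^2 + \zeta_p^2 - 4\delta\|\zeta_p\|\big) + \big(\tfrac{1}{2}\eta^2 - \zeta_r^2 - 4\delta\|\zeta_r\|\big) = b_1 + b_2$$

The joint study of the polynomials in $\|\zeta_p\|$ defined by $a_1$ and
$b_1$ shows that the conditions $a_1 > 0$ and $b_1 > 0$ are met for:

$$\|\zeta_p\| < \mu = \Big(4\delta^2 + \frac{\eta^2}{2}\Big)^{1/2} - 2\delta$$

Moreover, $a_2 \geq \big(\tfrac{1}{2}\eta^2 - 4\delta\|\zeta_r\|\big)$ and
$b_2 \geq \big(\tfrac{1}{2}\eta^2 - 5\delta\|\zeta_r\|\big)$.

The conditions $a_2 > 0$ and $b_2 > 0$ are met for
$\|\zeta_r\| < \frac{\eta^2}{10\delta}$.

It is enough to take $\nu = \min\{\mu, \frac{\eta^2}{10\delta}\}$.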

Proposition 1 means that, for a variation of the prototypes $w$ whose norm
is lower than $\nu$, the $\psi_p$ functions remain the same, either within
the energy function or within the adaptation rule. As a consequence, the
function $F(w, v)$ to integrate, which is a combination of $\psi_p$
functions with continuous functions of $w$, is continuous and
differentiable with respect to $w$ over $V^\eta(w)$. Whether $P$ is
continuous or not does not affect this result because $P$ does not depend
on $w$.
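
The invariance stated by Proposition 1 can be probed numerically in one dimension. The sketch below is only an illustration under stated assumptions: $V = [0, 1]$ (so that the diameter is $\delta = 1$), the convention $H(0) = 1$, and membership of $v$ in the cellular manifold tested through the condition $|d_r^2 - d_p^2| > \eta^2$ for all $r \neq p$ used in the proof. The bound is taken as the minimum of $(4\delta^2 + \eta^2/2)^{1/2} - 2\delta$ and $\eta^2 / (10\delta)$. All function names are hypothetical.

```python
import math
import random


def heaviside(x):
    """Convention assumed here: H(x) = 1 for x >= 0, else 0."""
    return 1 if x >= 0 else 0


def nu_bound(eta, delta):
    """Bound nu = min(mu, eta^2 / (10 delta)) on the prototype variation."""
    mu = math.sqrt(4 * delta ** 2 + eta ** 2 / 2) - 2 * delta
    return min(mu, eta ** 2 / (10 * delta))


def sq_dists(w, v):
    return [(wp - v) ** 2 for wp in w]


def in_cellular(w, v, eta):
    """Proxy for v in V^eta(w): |d_r^2 - d_p^2| > eta^2 for all r != p."""
    d2 = sq_dists(w, v)
    return all(abs(d2[r] - d2[p]) > eta ** 2
               for r in range(len(w)) for p in range(r))


def pattern(w, v):
    """All values H(d_r^2 - d_p^2), as a matrix indexed by (r, p)."""
    d2 = sq_dists(w, v)
    return [[heaviside(a - b) for b in d2] for a in d2]


def invariance_holds(w, eta, delta, trials=2000, seed=0):
    """Check that the Heaviside pattern is unchanged for |zeta| < nu."""
    rng = random.Random(seed)
    nu = nu_bound(eta, delta)
    for _ in range(trials):
        v = rng.random()
        if not in_cellular(w, v, eta):
            continue  # v lies in the tubular part: nothing is claimed there
        shifted = [wp + 0.999 * rng.uniform(-nu, nu) for wp in w]
        if pattern(shifted, v) != pattern(w, v):
            return False
    return True
```

A call such as `invariance_holds([0.2, 0.5, 0.9], eta=0.05, delta=1.0)` samples data and perturbations at random and reports whether any Heaviside value changed; it is a sanity check of the bound, not a substitute for the proof.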

4.1.2 Variations of the integration domain 

To study the variations of the energy function, it is necessary to study
the variations of the integration domains.

This proposition ensures that the variations of the integration domains
$V^\eta(w)$ and $V \setminus V^\eta(w)$ remain small for small variations
$\zeta$ of the prototypes.



Proposition 2: for $|\zeta| \ll 1$, we have:

(i) $|\mathrm{meas}(V \setminus V^\eta(w + \zeta)) - \mathrm{meas}(V \setminus V^\eta(w))| = O(|\zeta|^2)$

(ii) $|\mathrm{meas}(V^\eta(w + \zeta)) - \mathrm{meas}(V^\eta(w))| = O(|\zeta|^2)$

Proof: The proof of both equations is obtained by calculating the measure
of the tubular neighborhood $T_p^\eta(w)$ of the Voronoi cells. The
projections of these neighborhoods onto the coordinate axes verify:

$$|\mathrm{meas}(T_p^\eta(w + \zeta)) - \mathrm{meas}(T_p^\eta(w))| = O(|\zeta|^2)$$

In just the same way as in property (6), we can write:

$$|\mathrm{meas}(V \setminus V^\eta(w + \zeta)) - \mathrm{meas}(V \setminus V^\eta(w))| \leq \sum_{p=1}^{n} |\mathrm{meas}(T_p^\eta(w + \zeta)) - \mathrm{meas}(T_p^\eta(w))| = O(|\zeta|^2)$$

validating item (i) of the proposition.
Item (ii) is validated by observing that:

$$\mathrm{meas}(V^\eta(w)) + \mathrm{meas}(V \setminus V^\eta(w)) = \mathrm{meas}(V^\eta(w + \zeta)) + \mathrm{meas}(V \setminus V^\eta(w + \zeta))$$

Hence, for small variations $\zeta$ of $w$, the variations of the
integration domains remain negligible compared to $\zeta$.

4.1.3 Last step of the proof

We show that small variations $\zeta$ of $w$ (i.e. less than the bound
$\nu$ determined in Proposition 1) lead to a small variation of $E_V^\eta$
which breaks down into a linear map plus other terms of higher order, hence
that $E_V^\eta$ is a potential for all $w \in V^n$ and all
$v \in V^\eta(w)$. Then we show that $E_V(w) - E_V^\eta(w) = O(\eta)$,
$\forall w \in V^n$, hence that
$\lim_{\eta \to 0} \| E_V - E_V^\eta \|_\infty = 0$, demonstrating that
$E_V$ is a pseudo-potential and, at the same time, that
$\| E_V(w) - E_V^\eta(w) \|_\infty = O(\eta)$.
The difference $E_V^\eta(w + \zeta) - E_V^\eta(w)$ may be written as:

$$E_V^\eta(w + \zeta) - E_V^\eta(w) = \Big[ E_V^\eta(w + \zeta) - \int_{V^\eta(w)} F(w + \zeta, v)\, dv \Big] + \Big[ \int_{V^\eta(w)} F(w + \zeta, v)\, dv - E_V^\eta(w) \Big] = [\text{part 1}] + [\text{part 2}]$$

The function $F(w, v)$ being bounded on $V^n \times V$, Proposition 2 shows
that [part 1] is $O(|\zeta|^2)$.
Proposition 1 leads to:

$$[\text{part 2}] = -\sum_{j=1}^{n} \Big( \zeta_j \,\Big|\, \int_{V^\eta(w)} P(v)\, \psi_j(w, v)\, (v - w_j)\, dv \Big) + \sum_{j=1}^{n} \frac{\|\zeta_j\|^2}{2} \int_{V^\eta(w)} P(v)\, \psi_j(w, v)\, dv$$

The first term is of the form $L(\zeta)$, where $L$ is a linear map, and
the second term is of higher order. Thus, we can write
$E_V^\eta(w + \zeta) - E_V^\eta(w) = L(\zeta) + O(|\zeta|^2)$, which means
that $E_V^\eta(w)$ is differentiable for all $w \in V^n$ and all
$v \in V^\eta(w)$.

Moreover, because the function $F(w, v)$ is bounded on $V^n \times V$, we
can write:

$$E_V(w) - E_V^\eta(w) = \int_{V \setminus V^\eta(w)} F(w, v)\, dv \leq \sup_{(w, v) \in V^n \times V} F(w, v) \cdot \mathrm{meas}(V \setminus V^\eta)$$

whence, with property (6): $E_V(w) - E_V^\eta(w) = O(\eta)$, for all
$w \in V^n$. The energy function $E_V$ is then a pseudo-potential.

4.2 A pseudo-potential is not a potential in general 

As long as the neighborhood functions $\psi_p$ are of the form given in (4),
the theorem ensures that the corresponding energy function is a
pseudo-potential over the entire domain $V$, and at least a potential over
$V^\eta(w)$. We also know that there exist energy functions in this
framework (i.e. pseudo-potentials) which are potentials over the entire
domain $V$ for a continuous pdf $P$, e.g. the energy function of the K-means
(see section 5.1) (Pages, 1993). However, it remains to prove the existence
of energy functions in this framework which are not potentials over the
entire domain $V$, i.e. the existence of pseudo-potentials which are not
potentials.

Here we show that the energy function of the K-means does not correspond
to a global potential for a particular discontinuous pdf $P$, hence is not
a potential in general for all $P$.

In order to simplify the calculations, we consider only $n = 2$ prototypes
$w = (w_1, w_2)$ in a 1-dimensional space ($d = 1$). It is straightforward,
though messy, to extend this result to higher dimensions and a greater
number of prototypes.

The neighborhood function of the K-means, associated with each prototype,
is defined as:

$$\psi_1(w, v) = H(d_2^2 - d_1^2) = H(\|w_2 - v\|^2 - \|w_1 - v\|^2)$$

$$\psi_2(w, v) = H(d_1^2 - d_2^2) = H(\|w_1 - v\|^2 - \|w_2 - v\|^2)$$

which we shorten to $\psi_1(v)$ and $\psi_2(v)$ respectively. We have
$\psi_i(v \in V_i) = 1$ and $\psi_i(v \notin V_i) = 0$.

These functions are part of the family given by equation (4), hence the
corresponding energy function $E_V$ is a pseudo-potential and $E_V^\eta$ is
a potential. Observing that $E_V = (E_V - E_V^\eta) + E_V^\eta$, we are
going to show that $E_V$ is not a potential by showing that
$(E_V - E_V^\eta)$ is not a potential. The function $(E_V - E_V^\eta)$ is
not a potential with respect to $w$ iff the variation of this function for
some variation $\zeta$ of $w$ cannot be written as $L(\zeta) + O(\zeta^2)$,
i.e. as a linear form of $\zeta$ up to higher-order terms.

Let $(w_1 + w_2)/2$ be the origin of the directed line $(w_1 w_2)$. For a
small positive variation $\zeta = \zeta_1$ of $w_1$ (see figure 2), with
$0 < \zeta_1 < \eta$, we have:

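$$\Delta(\zeta) = (E_V - E_V^\eta)(w + \zeta) - (E_V - E_V^\eta)(w)$$

$$= \int_{V \setminus V^\eta(w + \zeta)} F(w + \zeta, v)\, dv - \int_{V \setminus V^\eta(w)} F(w, v)\, dv$$

$$= \frac{1}{2} \int_{V \setminus V^\eta(w + \zeta)} P(v) \big[ \psi_1^\zeta(v)\, \delta_1^\zeta(v) + \psi_2^\zeta(v)\, \delta_2(v) \big]\, dv - \frac{1}{2} \int_{V \setminus V^\eta(w)} P(v) \big[ \psi_1(v)\, \delta_1(v) + \psi_2(v)\, \delta_2(v) \big]\, dv \qquad (10)$$

where

$$\delta_1(v) = d_1^2 = \|w_1 - v\|^2, \quad \delta_1^\zeta(v) = \|w_1 + \zeta_1 - v\|^2, \quad \delta_2(v) = d_2^2 = \|w_2 - v\|^2$$

and

$$\psi_1^\zeta(v) = H(\delta_2(v) - \delta_1^\zeta(v)), \quad \psi_2^\zeta(v) = H(\delta_1^\zeta(v) - \delta_2(v)).$$

The domains $V \setminus V^\eta(w + \zeta)$ and $V \setminus V^\eta(w)$ are
defined on figure 2 and given below:

$$V \setminus V^\eta(w + \zeta) = [P_2, 0] \cup [0, P_3] \cup [P_3, P_4] \cup [P_4, P_5]$$

$$V \setminus V^\eta(w) = [P_1, P_2] \cup [P_2, 0] \cup [0, P_3] \cup [P_3, P_4]$$

where

$$P_1 = -\eta/2, \quad P_2 = -\eta/2 + \zeta_1/2, \quad P_3 = \zeta_1/2, \quad P_4 = \eta/2, \quad P_5 = \eta/2 + \zeta_1/2$$

and $0 < \zeta_1 < \eta$ leads to $P_1 < P_2 < 0 < P_3 < P_4 < P_5$.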

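Let us consider a particular uniform density $P(v)$ defined as:

$$P(v) = \begin{cases} \rho = \dfrac{1}{\beta - \lambda} & \text{if } v \in [\lambda, \beta], \text{ with } \lambda \in [0, P_4] \text{ and } \beta \gg \eta \\ 0 & \text{else} \end{cases}$$

Notice that $P$ is a pdf which is discontinuous at $\lambda$ and $\beta$.
We also have $0 < \zeta_1 < \eta \ll \beta$, hence $P_5 \ll \beta$. Then,
for such a density and from (10), we get:

$$\Delta(\zeta) = \int_{P_2}^{P_5} F(w + \zeta, v)\, dv - \int_{P_1}^{P_4} F(w, v)\, dv = \begin{cases} \dfrac{\rho}{2} \displaystyle\int_{\lambda}^{P_3} \big( \delta_1^\zeta(v) - \delta_2(v) \big)\, dv + \dfrac{\rho}{2} \displaystyle\int_{P_4}^{P_5} \delta_2(v)\, dv & \text{if } \zeta_1 > 2\lambda \ (\text{i.e. } \lambda < P_3) \\ \dfrac{\rho}{2} \displaystyle\int_{P_4}^{P_5} \delta_2(v)\, dv & \text{if } \zeta_1 < 2\lambda \ (\text{i.e. } \lambda > P_3) \end{cases} \qquad (11)$$

Developing equation (11) leads to:

$$\Delta(\zeta) = \begin{cases} \dfrac{\rho}{6} \big[ (\lambda - w_2)^3 - (\lambda - w_1)^3 + w_2^3 - w_1^3 \big] + \dfrac{\rho}{4} \big[ 2(w_1 - \lambda)^2 + (w_2 - \tfrac{\eta}{2})^2 - (w_1^2 + w_2^2) \big]\, \zeta_1 + O(\zeta_1^2) & \text{if } \zeta_1 > 2\lambda \\ \dfrac{\rho}{4} (w_2 - \tfrac{\eta}{2})^2\, \zeta_1 + O(\zeta_1^2) & \text{if } \zeta_1 < 2\lambda \end{cases}$$

$$= \begin{cases} L_1(\zeta_1) + O(\zeta_1^2) & \text{if } \zeta_1 > 2\lambda \\ L_2(\zeta_1) + O(\zeta_1^2) & \text{if } \zeta_1 < 2\lambda \end{cases} \qquad (12)$$

with $L_1 \neq L_2$. Therefore $\Delta(\zeta)$ is not a linear form of
$\zeta$, which proves the non-differentiability of $(E_V - E_V^\eta)$.
Hence $(E_V - E_V^\eta)$ is not a potential and so the energy function
$E_V$ is a pseudo-potential but not a potential in general.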
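
The two-case integral expression of $\Delta(\zeta)$ for the uniform density on $[\lambda, \beta]$ can be cross-checked numerically. The sketch below is a minimal illustration: the numerical values of the prototypes, of $\eta$, $\lambda$ and $\beta$ are arbitrary assumptions chosen to respect the orderings used in the text, the function names are hypothetical, and ties between the two prototypes (a set of measure zero) are attributed to the first one. It compares a direct midpoint-rule integration over the two tubular domains with the exact value of the two-case expression.

```python
# Assumed numerical values; only their ordering matters for the check.
W1, W2 = -1.0, 1.0        # prototypes, origin at their midpoint
ETA = 0.1                 # thickness of the tube around the boundary
LAM, BETA = 0.01, 10.0    # support [LAM, BETA] of the uniform density
RHO = 1.0 / (BETA - LAM)  # value of the density on its support


def pdf(v):
    return RHO if LAM <= v <= BETA else 0.0


def F(w1, w2, v):
    """Integrand of the K-means energy for n = 2 prototypes and d = 1."""
    d1, d2 = (w1 - v) ** 2, (w2 - v) ** 2
    psi1 = 1.0 if d2 - d1 >= 0 else 0.0
    psi2 = 1.0 - psi1
    return 0.5 * pdf(v) * (psi1 * d1 + psi2 * d2)


def midpoint(f, a, b, steps):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def delta_direct(zeta1, steps=20000):
    """Difference of the tube integrals after and before moving w1 by zeta1."""
    p1, p4 = -ETA / 2, ETA / 2
    p2, p5 = p1 + zeta1 / 2, p4 + zeta1 / 2
    after = midpoint(lambda v: F(W1 + zeta1, W2, v), p2, p5, steps)
    before = midpoint(lambda v: F(W1, W2, v), p1, p4, steps)
    return after - before


def sq_integral(c, a, b):
    """Exact integral of (c - v)^2 over [a, b]."""
    return ((b - c) ** 3 - (a - c) ** 3) / 3.0


def delta_piecewise(zeta1):
    """Two-case expression: the extra term appears only if zeta1 > 2 * LAM."""
    p3, p4, p5 = zeta1 / 2, ETA / 2, ETA / 2 + zeta1 / 2
    tail = sq_integral(W2, p4, p5)
    if zeta1 > 2 * LAM:
        head = sq_integral(W1 + zeta1, LAM, p3) - sq_integral(W2, LAM, p3)
        return 0.5 * RHO * (head + tail)
    return 0.5 * RHO * tail
```

Evaluating `delta_direct` and `delta_piecewise` on either side of $\zeta_1 = 2\lambda$ (for instance at 0.01 and 0.05 with the values above) exercises both branches; the two functions should agree up to the quadrature error.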

4.3 What is important about this result 

Consequence 1: The family of pseudo-potential functions includes potential
functions as a special case, and there exist pseudo-potential functions
which are not potentials.

Consequence 2: The previous example shows that a necessary condition
for the energy function of the K-means to be a potential is that, given any
number $n$, position $w$ and dimension $d$ of the prototypes, the boundary
of the Voronoi cells never crosses any discontinuity of the pdf $P$,
whatever the value of the variation $\zeta$. A sufficient condition for
this to hold is that $P$ be continuous.

Consequence 3: The energy function of the K-means is not a potential, at
least for some discontinuous pdf $P$. This complements the result of Pages
(Pages, 1993) stating that this energy function is a potential for
continuous $P$. Moreover, the algorithms presented in section 5, because
they reduce to the K-means for specific values of their parameters, also
share this property, which prevents them from being potentials in general
for all $P$ and all settings of their parameters. In particular, this
result holds for the Neural-Gas (Martinetz et al., 1993) despite the claim
of its authors: the Neural-Gas is not a global potential, at least for
discontinuous $P$ and a width $\sigma$ of the neighborhood function set
to 0. This casts some doubt on the validity of their proof, which does not
specify any restriction on $P$ and $\sigma$. As a consequence, the
convergence of the associated adaptation rule in general remains to be
proved.

4.4 Consequence of the theorem concerning the convergence

The consequence of the theorem is promising concerning the possible
convergence of the adaptation rules associated with pseudo-potentials
toward a local minimum. Indeed, from a mathematical point of view, talking
about "derivatives" of the energy function $E_V(w)$ on $V$ with respect to
some $w_p$ does not make sense because of the discontinuities of this
function on the Voronoi boundaries. The only possibility is to measure the
variations of this function for a small movement of the prototypes. We have
already shown that the volume of the tubular neighborhood of the Voronoi
boundaries is in $O(\eta)$ (Equation (6)), so it is bounded. Now this
theorem shows that the variations of the energy function for a bounded
movement $\zeta$ of the prototypes are also bounded.

Indeed, the theorem allows us to write that
$E_V(w + \zeta) - E_V^\eta(w + \zeta) = O(\eta)$ and
$E_V(w) - E_V^\eta(w) = O(\eta)$, so
$[E_V(w + \zeta) - E_V(w)] - [E_V^\eta(w + \zeta) - E_V^\eta(w)] = O(\eta)$.
Since $E_V^\eta$ is a potential, $E_V^\eta(w + \zeta) - E_V^\eta(w)$ is
bounded as a linear form of $\zeta$, which is bounded. Therefore
$E_V(w + \zeta) - E_V(w) = \Delta_V(\zeta)$ is also bounded although $E_V$
is not a potential on $V$. As a consequence, the effects of the variation
$\Delta_V(\zeta)$ of the energy function $E_V$ on the dynamics of the
prototypes remain negligible on average, even for some data falling on the
Voronoi boundaries. In other words, the existence of a pseudo-potential,
for lack of a potential, would be sufficient to ensure the convergence of
the associated adaptation rule, although a rigorous proof is still to be
carried out. The work of Bottou (Bottou, 1991) also gives insights in this
direction, following a different approach.

5 Consequence for existing rules 

In this section, we show that the neighborhood function of a large number
of algorithms can be written in the form of equation (4), i.e. as a
combination of Heaviside step functions of a difference of squared
distances $d_i$. This demonstrates that the corresponding adaptation rule
is associated with an energy function which is not necessarily a potential
but is at least a pseudo-potential.

5.1 K-means vector quantizer

The K-means vector quantizer (Mac-Queen, 1967) is the iterative version
of the Linde-Buzo-Gray batch learning technique for vector quantization
(Linde et al., 1980). It consists in presenting one datum $v$ at a time,
then selecting the closest prototype $w_{p^*}$ to it and moving it toward
$v$. The corresponding neighborhood function can be written as:

$$\psi_p^{[\text{K-means}]}(w, v) = A_p(w, v) = K_p(w, v) \prod_{l=1}^{n} \big( K_l(w, v)\, (H(l - p) - 1) + 1 \big) = \begin{cases} 1 & \text{if } v \in V_p \text{ and } p = \min(\{ i \in (1, \ldots, n) \mid K_i(w, v) = 1 \}) \\ 0 & \text{else} \end{cases} \qquad (13)$$

where the function $K_p$ is an indicator function of the Voronoi cell $V_p$
of $w_p$, defined as:

$$K_p(w, v) = \prod_{k=1}^{n} H(d_k^2 - d_p^2) = \begin{cases} 1 & \text{if } v \in V_p \\ 0 & \text{else} \end{cases} \qquad (14)$$

The function $A_p$ performs an additional sort over the indices of the
closest prototypes (the "winners") for which $K_p$ is equal to 1, i.e. all
the prototypes which are the closest to $v$. This is the algebraic form of
the algorithms which choose only one prototype among all the closest ones
in case of equality. Here, the choice is made according to the lowest
index; it could be the highest one, or a random choice among the indices of
all the winners. In the case where all the winners are moved, then
$\psi_p^{[\text{K-means}]}(w, v) = K_p(w, v)$ should be considered.
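
The construction of the K-means neighborhood from Heaviside step functions can be sketched as follows. This is only an illustration: it assumes the convention $H(0) = 1$, the Voronoi indicator taken as the product over $k$ of $H(d_k^2 - d_p^2)$, a tie-break by the lowest index, and 0-based indices; the function names are hypothetical.

```python
def heaviside(x):
    """Convention assumed here: H(x) = 1 for x >= 0, else 0."""
    return 1 if x >= 0 else 0


def sq_dists(w, v):
    """Squared Euclidean distances d_p^2 between each prototype and v."""
    return [sum((wi - vi) ** 2 for wi, vi in zip(wp, v)) for wp in w]


def voronoi_indicator(p, d2):
    """K_p: product over k of H(d_k^2 - d_p^2); 1 iff w_p is a winner."""
    result = 1
    for dk in d2:
        result *= heaviside(dk - d2[p])
    return result


def winner_indicator(p, d2):
    """A_p: K_p times, for every l, the factor K_l * (H(l - p) - 1) + 1.

    The factor is 1 when l >= p and 1 - K_l when l < p, so A_p is 1 only
    for the winner with the lowest index.
    """
    result = voronoi_indicator(p, d2)
    for l in range(len(d2)):
        result *= voronoi_indicator(l, d2) * (heaviside(l - p) - 1) + 1
    return result


def kmeans_neighborhood(w, v):
    d2 = sq_dists(w, v)
    return [winner_indicator(p, d2) for p in range(len(w))]


# The datum 1.0 is equidistant from the first two prototypes:
# both are winners, and only the lowest index is selected.
tie = kmeans_neighborhood([[0.0], [2.0], [4.0]], [1.0])
```

Written this way, the neighborhood function only depends on the data through the values $H(d_i^2 - d_j^2)$, which is the property required by the framework.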

The K-means algorithm corresponds to a Hard Competitive Learning
technique (Ahalt et al., 1990), where only the closest prototype to the
datum is adapted at a time. To escape from local optima of the energy
function, it has been improved by defining a neighborhood function which
enables the winner to be adapted together with some of its neighbors. All
the following algorithms belong to that class of Soft Competitive Learning
techniques (Ahalt et al., 1990), and each one defines its particular
neighborhood function.



5.2 Self-Organizing Maps and other graph-based neighborhoods

The Self-Organizing Map (SOM) proposed by Kohonen (Kohonen, 1982) defines
a set of connections between the prototypes, which corresponds to a graph
$G$ with a particular topology (e.g. a regular 2-dimensional grid). The
winner being determined according to the datum $v$, the neighborhood
function consists in weighting the adaptation step of the prototypes
according to their closeness to the winner on the graph $G$.

The corresponding neighborhood function may be written as:

$$\psi_p^{[\text{SOM}]}(w, v) = \sum_{q=1}^{n} A_q(w, v)\, h_\sigma(D_{q,p}(G)) \qquad (15)$$

where $h_\sigma$ is a non-increasing positive function with a tunable width
$\sigma$ (e.g. $h_\sigma(u) = e^{-u/\sigma}$) and $D_{a,b}(G)$ is the
distance between $w_a$ and $w_b$ in terms of the lowest number of edges
separating them within the graph $G$.
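
A graph-based neighborhood of this kind can be sketched as below. Since only one winner is selected, the sum over the prototypes reduces to the weight of the graph distance between the winner and each prototype. The example assumes an exponential weight $e^{-u/\sigma}$, a graph given as an adjacency dictionary, and a tie-break by the lowest index; all names are hypothetical.

```python
import math
from collections import deque


def graph_distances(adjacency, source):
    """Breadth-first search: lowest number of edges from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        a = queue.popleft()
        for b in adjacency[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def som_neighborhood(w, v, adjacency, sigma):
    """Weight of each prototype: h_sigma(graph distance to the winner)."""
    d2 = [sum((wi - vi) ** 2 for wi, vi in zip(wp, v)) for wp in w]
    winner = d2.index(min(d2))          # lowest index wins ties
    dist = graph_distances(adjacency, winner)
    # Prototypes not connected to the winner receive a zero weight.
    return [math.exp(-dist[p] / sigma) if p in dist else 0.0
            for p in range(len(w))]


# Four prototypes connected as a chain 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
weights = som_neighborhood([[0.0], [1.0], [2.0], [3.0]], [1.2], chain, sigma=1.0)
```

As long as the graph and the number of prototypes stay fixed, the weights depend on $w$ and $v$ only through the identity of the winner, hence only through comparisons of squared distances.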

Several other algorithms essentially differ from the SOM in that they use
a graph whose topology is not defined a priori but derived from the data
and from the positions of the prototypes in the data space. This is the
case in the Growing Neural-Gas (GNG) of (Fritzke, 1995b), where $G$ is the
Induced Delaunay Triangulation (IDT) (Martinetz and Schulten, 1994), in
(Kangas et al., 1990) with Minimum Spanning Trees (MST), in
(Mou and Yeung, 1994) with Gabriel Graphs, in the Growing Cell Structure
(GCS) of (Fritzke, 1994) with a set of simplices of fixed dimension, and in
the growing versions of SOM (GSOM) of (Fritzke, 1995a) and
(Villmann and Bauer, 1997) with an adaptive grid structure. As long as $n$
remains constant and the graph $G$ remains the same, the neighborhood
function of all these models is identical to that of the SOM written
above, and belongs to the framework we consider in this paper.

5.3 Neural-Gas

In the Neural-Gas (Martinetz et al., 1993), the prototypes are ranked in
increasing order of their distance to the datum $v$. This rank is used to
weight the adaptation rule of the prototypes. Martinetz et al. give the
corresponding neighborhood function:

$$\psi_p^{[\text{Neural-Gas}]}(w, v) = h_\sigma(k_p(w, v)) \quad \text{with} \quad k_p(w, v) = \sum_{l=1}^{n} \Gamma(d_p^2 - d_l^2) \qquad (16)$$

where $\Gamma(u) = 1 - H(-u)$, $\forall u$. The function $k_p$ is the rank
of the prototype $w_p$, such that $k_p(w, v) = j - 1$ iff $w_p$ is the
$j$-th closest vector to $v$ (several prototypes may have the same rank).
Note that the Neural-Gas could be included in the previous family of
adaptive graph-based neighborhoods by considering $G$ as the graph which
connects the $n$ nearest neighbors of $v$ among $w$ in a chain, where the
$i$-th nearest neighbor is connected to the $(i-1)$-th ($\forall i > 1$) and
the $(i+1)$-th ($\forall i < n$).
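
The rank used by the Neural-Gas can be computed directly from step functions of differences of squared distances, as in the following sketch. It assumes the convention $H(0) = 1$, so that the step $1 - H(-u)$ equals 1 only for $u > 0$ and the rank of a prototype is the number of prototypes strictly closer to the datum; the exponential weight $e^{-k/\sigma}$ and the function names are assumptions of the example.

```python
import math


def heaviside(x):
    """Convention assumed here: H(x) = 1 for x >= 0, else 0."""
    return 1 if x >= 0 else 0


def gamma(u):
    """Gamma(u) = 1 - H(-u): 1 for u > 0, else 0."""
    return 1 - heaviside(-u)


def ranks(w, v):
    """k_p = sum over l of Gamma(d_p^2 - d_l^2), for every prototype p."""
    d2 = [sum((wi - vi) ** 2 for wi, vi in zip(wp, v)) for wp in w]
    return [sum(gamma(dp - dl) for dl in d2) for dp in d2]


def neural_gas_neighborhood(w, v, sigma):
    """Weight of each prototype as a decreasing function of its rank."""
    return [math.exp(-k / sigma) for k in ranks(w, v)]


# The second prototype is the closest (rank 0), the third the farthest.
example_ranks = ranks([[0.0], [1.0], [3.0]], [0.9])
```

With this convention, two prototypes at the same distance from the datum receive the same rank, in agreement with the remark that several prototypes may share a rank.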

5.4 Recruiting rules 

One of us proposed the "Recruiting" Neural-Gas (Aupetit, 2000) as a way to
cope with function approximation tasks using vector quantizers. A
recruiting factor is added to the Neural-Gas adaptation rule. Such a factor
is associated with each prototype, and the winner imposes its own factor on
the others. This tends to gather the prototypes around the one which has
the highest recruiting factor. Setting this factor proportional to the
local output error made when approximating a function then enables more
prototypes to be grouped together in areas of the input space where the
corresponding output function is more difficult to approximate. This tends
to decrease the global approximation error.
The corresponding neighborhood function may be written as:

$$\psi_p^{[\text{RecruitingNG}]}(w, v) = \sum_{q=1}^{n} \ldots \qquad (17)$$

where $\forall q$, $\epsilon_q \in [0, 1]$. Taking
$\epsilon_q = \epsilon_p = 1$, $\forall p, q$, leads to the usual
Neural-Gas.

Goppert and Rosenstiel (Goppert and Rosenstiel, 2000) proposed a similar
approach with a SOM for which each prototype defines its own neighborhood
width $\sigma_q$, tuned according to the local output approximation error.
The corresponding neighborhood function may be written as:

$$\psi_p^{[\text{RecruitingSOM}]}(w, v) = \sum_{q=1}^{n} A_q(w, v)\, h_{\sigma_q}(D_{q,p}(G)) \qquad (18)$$

where $\forall q$, $\sigma_q \in [0, 1]$. Taking
$\sigma_q = \sigma_p = \sigma$, $\forall p, q$, leads to the usual SOM.

In both approaches, as long as $\epsilon_q$ and $\sigma_q$ remain
independent of $v$ and $w$, the corresponding neighborhood function belongs
to the framework we consider in this paper.

5.5 Concerning the algorithms with adaptive structures

We have shown that many vector quantization algorithms belong to our 
framework. However, considering dynamic approaches such as the algorithms 
which adapt either the number n of prototypes (GCS, GNG, GSOM), the 
graph of their neighborhood structure (GNG, GSOM), or the recruiting fac- 
tor (RecruitingNG, RecruitingSOM), according to either the number of it- 
erations, the position of the prototypes or the output approximation error, 
it is still difficult to define a framework taking into account these structural 
changes. That is why we considered these dynamic parameters to be fixed 
in such cases. 

5.6 About some algorithms which do not belong to the 
present framework 

Note that the modified Self-Organizing Map proposed by Heskes and Kappen
(Heskes and Kappen, 1993; Heskes, 1999) does not belong to the present
framework. Indeed, the Heaviside step functions involved in the
corresponding $\psi_p$ neighborhood functions are not applied to a pair of
square distances $d_i$ directly, but to a sum over $w$ of weighted square
distances $d_i$. This prevents $\psi_p$ from belonging to the family we
consider in equation (3). However, it seems possible to enlarge our
framework in order to encompass the neighborhood function they proposed.

The $\gamma$-Observable Neighborhood has been proposed by one of us
(Aupetit et al., 2002) as a neighborhood that decreases the number of
iterations needed for the adaptation rule to converge toward an optimum of
the energy function. The corresponding neighborhood function does not belong
to the present framework. However, we have already defined an extension of
this framework which encompasses this adaptation rule and thus makes it
possible to demonstrate that the energy function associated with the
$\gamma$-Observable Neighbors is also a pseudo-potential. This work has not
been published yet.

6 Conclusion 

In vector quantization, we propose a framework which ensures the existence
of a family of potential functions (i.e. differentiable functions) which
converges uniformly to the energy function, which we call in such a case a
"pseudo-potential". We demonstrate that a pseudo-potential is not
necessarily differentiable everywhere, hence it is not always a potential.
As a consequence, the corresponding adaptation rule does not necessarily
perform a stochastic gradient descent along this energy function.

We also show how a large number of existing vector quantization algorithms
belong to this framework; hence, even if they are not associated with
potentials, they are at least associated with pseudo-potentials. This
framework makes it possible to study the convergence of all these algorithms
at once. Although the pseudo-potentials are not necessarily potentials, a
consequence of the theorem shows that the variations of the
pseudo-potentials on the boundaries of the Voronoi cells remain bounded, so
they have a negligible effect on the dynamics of the prototypes on average.
This is a promising preliminary result about the convergence of the
corresponding adaptation rules.

If the convergence of the adaptation rules associated to pseudo-potentials 
were demonstrated then the present framework would constitute an a pos- 
teriori justification of a large family of adaptation rules considered up to 
now as heuristic. Moreover, this framework makes possible the design of new 
adaptation rules respecting the hypotheses which ensure the existence of the 
corresponding pseudo-potential. 

The results of this paper suggest two avenues for future research: 

• investigating the convergence properties of the adaptation rules associ- 
ated to pseudo-potentials in general. 

• extending this framework to a wider class of neighborhood functions. 

By introducing pseudo-potentials, we add a new concrete framework to the
wasteland of non-potentials. Within this framework, the consequence of the
theorem gives us hope of building new theorems which could ensure at once
the convergence, with respect to a specific norm, of a large number of
existing vector quantization algorithms which are not associated with
potentials but at least with pseudo-potentials.




Figure 1: Cellular and tubular manifolds. The circles are the prototypes
$w$. We define tubular manifolds $V \setminus V_{\eta}(w)$ of thickness
$\eta$ (dotted lines) which contain the boundaries of the Voronoi cells
(plain lines), and cellular manifolds denoted $V_{\eta}(w)$, complementary
to the tubular manifolds.

Figure 2: Variation of the integration domain with $\zeta_1$. The point O
is the Voronoi boundary between $w_1$ and $w_2$. The point P3 is the Voronoi
boundary between $w_1 + \zeta_1$ and $w_2$.